The Effects of Randomly Sampled Training Data on
Author
Abstract
The effects of randomly sampled training data on genetic programming performance are empirically investigated. Often the most natural, if not the only, means of characterizing the target behaviour for a problem is to randomly sample training cases inherent to that problem. A natural question to raise about this strategy is, how deleterious is the random sampling of training data to evolution performance? Will sampling reduce the evolutionary search to hill climbing? Can re-sampling during the run be advantageous? We address these questions by undertaking a suite of different GP experiments. Parameters include various sampling strategies (single, re-sampling, ideal samples), generational and steady-state evolution, and non-evolutionary strategies such as hill climbing and random search. The experiments confirm that random sampling effectively characterizes stochastic domains during genetic programming, provided that a sufficiently representative sample is used. An unexpected result is that genetic programming may perform worse than random search when the sampled training sets are exceptionally poor. We conjecture that poor training sets cause evolution to prematurely converge to undesirable optima, which irrevocably handicaps the population's diversity and viability.
Similar resources
The Effects of Randomly Sampled Training Data on Program Evolution
The effects of randomly sampled training data on genetic programming performance are empirically investigated. Often the most natural, if not the only, means of characterizing the target behaviour for a problem is to randomly sample training cases inherent to that problem. A natural question to raise about this strategy is, how deleterious is the random sampling of training data to evolution perfor...
Semiparametric Bayes analysis of longitudinal data treatment models
This paper is concerned with the problem of determining the effect of a binary treatment variable on a continuous outcome given longitudinal observational data and non-randomly assigned treatments. A general semiparametric Bayesian model (based on Dirichlet process mixing) is developed which contains potential outcomes and subject level outcome-specific random effects. The model is subjected to a ...
The Effects of Randomly Sampled Training Data on Program Evolution
The effects of randomly sampled training data during genetic programming are empirically investigated. Sometimes the most natural, if not the only, means of characterizing the target behaviour for some problems is to randomly sample training cases inherent to the problems in question. A natural question to raise about this strategy is, how deleterious is the random sampling of training data to ev...
The Effects of Training Set Size on Decision Tree Complexity
This paper presents experiments with 19 datasets and 5 decision tree pruning algorithms that show that increasing training set size often results in a linear increase in tree size, even when that additional complexity results in no significant increase in classification accuracy. Said differently, removing randomly selected training instances often results in trees that are substantially small...
The Effect of Endurance Training Along with Curcumin on VEGF-A Level and VEGFR Gene Expression in Cancer Tissue of Female Mice with Breast Cancer
Introduction: Breast cancer is the most common cancer and the leading cause of death among women worldwide. The aim of the present study was to determine the synergistic effects of 5 weeks of endurance training along with curcumin on cancer progression, levels of VEGF-A, and gene expression of VEGFR in cancer tissue of female mice with breast cancer. Methods: The present study was an experimental study...